ABSTRACT

The Learning Style Inventory (LSI; Kolb, 1976, 1985) is a
commonly used measure of learning styles based on Kolb's Experiential Learning
Model. The psychometric soundness of LSI scores has been critiqued 
historically. This study reviewed the literature on the LSI and evaluated the 
psychometric properties of Kolb's original and revised versions of the LSI. 
Researchers identified 110 articles that used the LSI. Fifty-nine articles
made no mention of reliability, and slightly fewer than a third of the 110
reported reliability for the obtained scores appropriately. Findings indicate
that continued use of the LSI should be considered questionable. Reliabilities 
can vary as researchers administer the instrument across different settings. 
Thus reliability generalization may be warranted to examine score reliability 
meta-analytically across studies. Based on the work of L. Vacha-Haase, this 
study discusses the possibility of examining the variance of measurement error 
across studies as part of the literature review. (Contains 2 tables and 68 
references.) (Author/SLD) 
Abstract

The Learning Style Inventory (LSI) is a commonly employed 
measure of learning styles based on Kolb's Experiential Learning 
Model. Nevertheless, the psychometric soundness of LSI scores 
has historically been critiqued. The present article reviews the 
literature and critically evaluates the psychometric properties 
of Kolb's original and revised versions of LSI. Reliabilities 
can vary as researchers administer the instrument across 
different settings. Thus, reliability generalization (RG) may 
be warranted to meta-analytically examine score reliability 
across studies. Based on Vacha-Haase (1998), this study will 
discuss the possibility of examining the variance of measurement 
error across studies as part of the literature review. 
A Critical Review of the Literature on Kolb's Learning Style
Inventory with Implications for Score Reliability 

The study of individual learners' preferences or styles is
an appealing concept for educators (cf. De Bello, 1990). Many
researchers make an a priori assumption that learning style is
measurable (e.g., Cross, 1976; Keefe, 1979), and a number of
theories and resulting instruments have been developed (De
Bello, 1990). Researchers have used these theoretical frameworks
and inventories in diverse disciplines and have attempted to
correlate learning style or preference with many other variables
(Geller, 1979).

Kolb's Experiential Learning Model (ELM) 

Historically, one of the more popular theoretical models of
learning style has been Kolb's (1976) ELM. The ELM depicts
learning as a cyclic process involving four modes: (a) concrete
experience (CE), (b) reflective observation (RO), (c) abstract
conceptualization (AC), and (d) active experimentation (AE).
According to the theory, the effective learner typically
participates in new experiences (CE) and then reflects upon
these experiences (RO) in order to develop informal theories
(AC). Then, the learner uses these theories to make decisions
or solve problems (AE).

Kolb (1976) further proposed that CE and AC, as well as RO
and AE, represented polarized abilities that lie at opposite
ends of a continuum. These two dimensions were also
hypothesized to be orthogonal. Although the ideal learner
integrates and utilizes all four abilities, the average learner
favors one ability on each dimension. Consequently, from the
combination of an individual's preference for abstractness over
concreteness (AC-CE) and for action over reflection (AE-RO), an
individual is assigned to one of four learning styles: (a)
Assimilator (AC and RO), (b) Converger (AC and AE), (c)
Accommodator (CE and AE), or (d) Diverger (CE and RO). The
reader is referred to Atkinson (1991), Kolb (1974, 1976, 1985),
and Pickworth and Schoeman (2000) for broader discussions of the
ELM.

The Learning Style Inventory (LSI)

To operationalize his theory, Kolb (1976, 1985) developed
the LSI as a measure of learning style, which enabled the
classification of individuals into one of the four dominant styles
noted above. The LSI is one of the more commonly used
instruments in this area and has continued to be employed in recent
years (cf. Chou & Wang, 1999; Geiger & Boyle, 1992; Yuen & Lee,
1994).

Based largely on the work of Kolb himself, Geller (1979) 
noted that even early on: 

The inventory has been used to examine relationship between
learning style and age (Kolb, 1971, 1976), sex (Kolb,
1976), educational level (Kolb, 1971, 1976), undergraduate
major (Kolb, 1971, 1974, 1976), creativity (Kolb, 1976),
personality (Kolb, 1976), occupation (Kolb, 1971, 1976),
career choice (Kolb, 1976; Kolb & Fry, 1974; Plovnick,
1975; Sadler, Plovnick, & Snope, 1978; Wunderlich & Gjerde,
1978), career-choice influences (Plovnick, 1975; Wunderlich
& Gjerde, 1978), approach to management education (Kolb,
1974), creating and maintaining an effective learning
organization (Kolb, Rubin, & McIntyre, 1971), communication
among different functional units in an organization (Kolb,
1974), and preference for a particular instructional method
or learning situation (Kolb, 1976; Sadler, Plovnick, &
Snope, 1978; Whitney & Caplan, 1978). (p. 556)

With a more recent revision of the LSI (Kolb, 1985), the
inventory has enjoyed a relatively long tenure of use. However,
as noted below, the LSI has also been severely criticized
regarding its psychometric properties.

Original/revised versions of the LSI 

The first formal version of the LSI appeared in 1976 (Kolb,
1976); the inventory was revised in 1985 (Kolb, 1985). The
original LSI (1976) consisted of nine items, each a row of four
words, with one word representing each experiential style.
Respondents rank order their preferences concerning the four
words in each row that corresponded to Kolb's four learning
styles: CE, RO, AC, and AE. The original LSI scored only six
items in each column; the remaining three items per column
served as distracters and were omitted from scoring.

The 1976 version was subject to psychometric critique that
largely centered on poor score reliability (see, e.g., Geller, 1979;
Wilson, 1986). Kolb (1985) therefore revised the format and
scoring of the instrument, resulting in twelve rows of four
sentence completion items that related to the four learning
styles. Respondents again rank order their preferences on the
four sentences in each row from 1 to 4. Unlike the prior
version, all 12 items are used in scoring with no distracters.
Further, each column represents a single style (i.e., CE, RO,
AE, AC), leading some to suggest the risk of a response-set bias
(Atkinson, 1988, 1989; Ruble & Stout, 1990, 1991; Sims, Veres,
Watson, & Buckner, 1986; Veres, Sims, & Shake, 1987).

In spite of apparent face validity and frequency of use, 
both versions of the LSI have been attacked as regards the 
validity and reliability of their scores. Previous measurement 
studies have addressed several psychometric problems such as the 
use of ipsative scoring (cf. Merritt & Marshall, 1984), 
questionable factor structure (cf. Geiger, Boyle, & Pinto, 1992, 
1993), response-set bias (cf. Ruble & Stout, 1994), and 
reliability and validity (cf. Atkinson, 1991).
Issues with ipsativity

Cattell (1944) coined the term "ipsative", referring to
"measures that can be meaningfully interpreted
intraindividually, as contrasted with 'normative' measures that
can be interpreted interindividually" (Pedhazur & Schmelkin,
1991, p. 21). Essentially, ipsative measures require respondents
to rank order responses, thus yielding ordinal data that
do not contain information regarding the magnitude of differences
between observations. Both versions of the LSI use a rank order
format for rating preferences for words (1976) or sentences (1985).

Importantly, the ranking is not an ordering of individuals
on a trait (e.g., highest in concreteness, next, and so on), but
rather is performed within the individual as respondents rank
sets of items (i.e., CE, RO, AE, AC) from 1 to 4. Accordingly,
responses to one item will necessarily be dependent on responses
to other items in the set. Furthermore, the ipsative nature of
this ranking creates artifactual negative correlations among
measured attributes, because when a person ranks one attribute
as 1, other attribute ranks must be higher than 1. Of course,
the converse of this would be true as well, creating a situation
where low scores on one attribute tend to correspond to higher
scores on the other attributes. Table 1 provides a heuristic
example (adapted from Cornwell and Dunlap [1994, p. 91]) of this
problem for five subjects across the four scales in the LSI and
illustrates the tendency toward negative interdependence among the
correlations.
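
The correlations in Table 1 can be reproduced directly from the five sets of rankings. The following is a minimal sketch in Python; the function and variable names are illustrative only and are not part of the LSI or of Cornwell and Dunlap's (1994) presentation.

```python
from math import sqrt

# Ipsative rankings for the five subjects in Table 1 (columns: CE, RO, AE, AC).
rankings = [
    (1, 2, 3, 4),
    (2, 1, 4, 3),
    (3, 1, 2, 4),
    (1, 3, 2, 4),
    (1, 3, 4, 2),
]
scales = ["CE", "RO", "AE", "AC"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


columns = list(zip(*rankings))  # one tuple of five ranks per scale

# Lower-triangle correlations, keyed as (row scale, column scale).
corr = {
    (scales[i], scales[j]): pearson(columns[i], columns[j])
    for i in range(4)
    for j in range(i)
}

for (row, col), r in corr.items():
    print(f"{row} with {col}: {r:+.2f}")

# Every subject's ranks sum to 10, so a high rank on one scale must be
# offset by lower ranks elsewhere; the average intercorrelation is negative.
mean_r = sum(corr.values()) / len(corr)
print(f"mean off-diagonal r: {mean_r:+.2f}")
```

Because the row sums are fixed, the negative average arises from the scoring format alone, whatever the respondents' actual preferences.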



INSERT TABLE 1 ABOUT HERE 

Reliability Issues 

Because ipsative scores are interdependent, they have 
limited value for many psychometric purposes. As noted, 
artifactual negative interdependence is a function of the 
scoring method. This limits the factorability of ipsative scores 
and can yield artificial bipolar factors, such as those proposed 
by Kolb (1976, 1985) (i.e., AC-CE and AE-RO). Accordingly, the
validity of LSI scores has been questioned (Atkinson, 1991;
Cornwell, Manfredo, & Dunlap, 1991; Ruble & Stout, 1994; Wilson,
1986). Cornwell and Dunlap (1994) and Hicks (1970) provide
useful summaries of the limitations of ipsative scores. 

Importantly, the 1976 LSI did not use all items in the 
final scoring due to the inclusion of distracters; the 1985 LSI 
scored all items. Some authors have therefore characterized the 
1976 version as partially ipsative and the 1985 version as 
completely ipsative (cf. Ruble & Stout, 1994). Logically, the
negative interdependence noted above would be most pronounced in
fully ipsative data (Hicks, 1970).

The LSI has also been challenged on reliability grounds. 
Atkinson (1991), Geller (1979), Pickworth and Schoeman (2000), 
Ruble and Stout (1990, 1994), Sims et al. (1986), and others
have discussed the historical reliability of LSI scores for both 
versions. The notable number of published psychometric reviews 
speaks to both the wide use of the LSI and the debate 
surrounding its measurement quality. 

The 1976 LSI appeared to yield scores with marginal 
internal consistency and poor test-retest reliability. Scores 
from the 1985 LSI appeared to have stronger, perhaps acceptable, 
internal consistency but continued to have poor, perhaps even 
worse, temporal stability. However, the interpretation of LSI 
score reliabilities is confounded with the ipsative nature of 
the scoring. Tenopyr (1988) demonstrated the artifactual
reliability possible for multiple forced choice scales. Ruble
and Stout (1994) further argued that the internal consistency
improvement for 1985 LSI scores was inflated due to the fully
ipsative nature of the scoring, in contrast to the 1976 version,
which was only partially ipsative.

Another possible reason for reliability inflation in 1985
LSI scores is response bias (Atkinson, 1991; Sims et al., 1986;
Wilson, 1986). For the LSI, response bias may be caused by the
simplified scoring format, in which items for the same learning
mode appear in the same column. Wilson (1986) examined this
possibility with three different LSI versions: standard items,
randomized items, and elaborated items. Wilson noted that the
randomized and elaborated versions produced less reliable scores
than the standard version for both test-retest stability and
internal consistency. He suggested that the correlations for the
standard version might be inflated by response bias.

Validity Issues 

Several studies have assessed construct validity of LSI 
scores using factor analysis. Factor analysis examines the 
internal structure of an instrument, which is relevant to the 
assessment of construct validity (Nunnally & Bernstein, 1994;
Thompson & Daniel, 1996). Kerlinger suggested that the
misunderstanding of ipsative measures might lead to false 
interpretation of factor analysis. The ipsative format of the 
LSI can cause spurious negative correlations among the items and 
distort factor analysis results. 

Kolb (1976b) proposed a bipolar two-factor structure in his 
ELM. Extant factor analytic studies provide conflicting results
about these bipolar dimensions in the LSI. Given Kolb's theory,
factor analysis should not extract four distinct factors (i.e.,
one for each style) but two orthogonal factors (i.e., one for
each dimension). The extraction of four distinct factors
suggests that the learning abilities are independent. Unless
the two bipolar factors are the result of spurious negative
intercorrelations caused by ipsative scales, a two-factor
solution would support two bipolar dimensions of learning
proposed by the ELM while four independent factors would not.

Certo and Lamb (1980) compared the ipsative scales with 
normative (Likert) scales in their study. The ipsative version 
provided a two-factor structure, but the normative version did
not. Further, Merritt and Marshall (1984) found a four-factor
structure with the normative instrument, rather than the two 
bipolar dimensions posited by the ELM. They concluded that the 
normative form supports construct validity of the LSI. 

Cornwell, Manfredo, and Dunlap (1991) provided both two-factor
and four-factor solutions, with results unsupportive of
Kolb's two bipolar dimensions. Geiger, Boyle, and Pinto (1992)
also provided two-factor and four-factor solutions. In the
two-factor solution, CE and RO items tended to weight together, as
did AC and AE items. In the four-factor solution,

Geiger, Boyle, and Pinto (1993) used the standard LSI 
(ipsative format) of the 1985 version and a modified version 
(normative format). In the two-factor structure, CE items and
RO items tended to weight together, while AC items and AE items
tended to weight together. In the four-factor structure, only
the AC items weighted as a distinct factor. Their results did 
not support the hypothesized bipolar dimensions. 

According to the bipolar assumptions of the ELM, the 
opposite scales (i.e., CE with AC and RO with AE) should have 
strong negative correlations with each other and the orthogonal
("non-opposite") scales (i.e., CE with RO, AE with AC, CE with
AE, and RO with AC) should have zero (or near zero)
correlations. However, previous studies have observed negative
correlations of a given style with other "non-opposite" styles
(Highhouse & Doverspike, 1987; Ruble & Stout, 1990; Smith &
Kolb, 1986). Thus, the pattern of intercorrelations of scores
from the revised 1985 version of the LSI tends not to support 
the bipolar structure of the ELM. 

Implications for Score Reliability 
Although several reviews exist that examine the reliability 
of LSI scores, most reviews do not simultaneously address 
differences in score reliabilities from both versions of the LSI 
as well as other modified versions. Further, none of the reviews 
identified study features that may be predictive of reliability 
variation across studies. We therefore examined the extant 
literature using the LSI to characterize the variation of 
measurement error across administrations of the LSI.

Method 

Sample of Articles 

Searches of the ERIC and PsycINFO databases using the 
keywords "Learning Style Inventory" and "LSI" were conducted. 
Only published articles were retained, which left 127 ERIC and
199 PsycINFO articles. After eliminating duplicates between the
databases, 290 articles remained, of which 174 were false hits
(i.e., did not address the LSI) and 11 were theoretical. These
were also eliminated, leaving 105 articles. An additional five
articles were added to this pool as a result of secondary 
identification of articles by backtracking references in the 
articles originally noted in the database searches. This left a 
final pool of 110 articles that employed the LSI. Each article 
was read and placed into one of several categories. 

Fifty-nine (53.6%) articles made no mention of reliability. 
Fifteen (13.6%) articles "inducted" (Vacha-Haase, Kogan, &
Thompson, 2000) reliabilities by citing coefficients from prior
studies or the test manual. Two articles reported reliability
for data in hand but not in a usable format (e.g., reported a
range of coefficients). A little less than a third (n = 34, 30.9%)
of the articles appropriately reported reliability for the
obtained scores. However, many of these articles reported
multiple reliability estimates, leaving a sample of 206 internal
consistency and 182 test-retest coefficients across the various
subscales and dimensions of the LSI.
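
The screening counts and category percentages reported above can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative, and the number of duplicates is implied by the reported totals rather than stated.

```python
# Counts reported in the text for the article search and screening.
eric, psycinfo = 127, 199
after_dedup = 290
false_hits, theoretical = 174, 11
added_by_backtracking = 5

# Duplicates are implied by the totals, not reported directly.
duplicates = eric + psycinfo - after_dedup
final_pool = after_dedup - false_hits - theoretical + added_by_backtracking

# Reporting categories for the final pool of articles.
categories = {
    "no mention of reliability": 59,
    "inducted from prior sources": 15,
    "reported, unusable format": 2,
    "reported appropriately": 34,
}

print(f"implied duplicates: {duplicates}")
print(f"final pool: {final_pool}")
for label, n in categories.items():
    print(f"{label}: {n} ({100 * n / final_pool:.1f}%)")
```

The four categories account for every article in the final pool.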

Reliability generalization 

Reliability generalization (RG) is a meta-analytic
technique to characterize (a) the measurement error variance for
a given test across studies, (b) the amount of variability in
reliability coefficients for given measures, and (c) the sources
of variability in reliability coefficients across studies
(Vacha-Haase, 1998). The present paper only examined
variability of reliability estimates across studies. Study
features that are predictive of reliability variation are
reported elsewhere (Henson & Hwang, in press).
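
The descriptive step of an RG study amounts to summarizing the coefficients harvested from individual studies by test form. A minimal sketch follows; the coefficients are hypothetical values invented for illustration and are not the data analyzed here, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

# HYPOTHETICAL alpha coefficients, invented for illustration only;
# they are not the coefficients analyzed in this review.
coefficients = {
    "1976 version": [0.35, 0.48, 0.52, 0.41],
    "1985 version": [0.79, 0.82, 0.84, 0.80],
}


def summarize(values):
    """Return (mean, sample SD, count) for a list of reliability coefficients."""
    return mean(values), stdev(values), len(values)


summary = {form: summarize(vals) for form, vals in coefficients.items()}
for form, (m, sd, n) in summary.items():
    print(f"{form}: M = {m:.3f}, SD = {sd:.3f}, N = {n}")
```

The standard deviation within each form is the quantity of interest for RG, because it indexes how much score reliability fluctuates across administrations.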

Results and discussion 

Descriptive statistics (see Table 2) indicated considerably 
larger mean coefficient alphas for the 1985 version scores as 
compared to scores on the original 1976 form. For test-retest
reliability, however, the 1985 form scores performed slightly
worse than those from the 1976 form, whereas the revised 1985
forms yielded scores that did much better. These findings are
consistent with prior studies (cf. Atkinson, 1991; Geller, 1979;
Pickworth & Schoeman, 2000; Ruble & Stout, 1990, 1994; Sims et
al., 1986).

It is clear that the 1985 version of the LSI yielded more 
reliable scores as regards internal consistency. However, scores 
from the revision gave slightly lower test-retest coefficients. 
Thus, the apparent improvement in internal consistency was not 
matched by a corresponding improvement in temporal stability. As 
the standard deviations in Table 2 demonstrate, the measurement 
error possible in LSI scores can be considerable. 

At a minimum, researchers ought to examine reliability for 
their LSI scores and interpret effects in light of reliability 
(Wilkinson & APA Task Force on Statistical Inference, 1999).
However, the lack of reliability in LSI scores is substantial
enough to warrant either (a) discontinuation of use or (b)
considerable revision of the instrument. Indeed, several authors
have called for the abolition of the LSI due to its psychometric
infirmities (see, e.g., Atkinson, 1991; DeCoux, 1990; Ruble &
Stout, 1994).
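
For researchers who wish to examine internal consistency for the data in hand, coefficient alpha can be computed from a respondents-by-items score matrix. The sketch below uses the standard formula for alpha; the responses are hypothetical normative (Likert-type) ratings invented for illustration, and the function name is likewise illustrative.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of respondents by columns of items."""
    k = len(item_scores[0])                     # number of items
    items = list(zip(*item_scores))             # one tuple of scores per item
    totals = [sum(row) for row in item_scores]  # total score per respondent
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance(totals)
    return (k / (k - 1)) * (1 - item_var / total_var)


# HYPOTHETICAL responses: 5 respondents by 4 items on one subscale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

The coefficient is a property of the scores from a particular sample, so it should be recomputed for each administration rather than cited from a manual.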

The current results, however, suggest that some promise may 
be found in studies (cf. Pickworth & Schoeman, 2000) revising 
the 1985 form in various ways (e.g., use of normative rather 
than ipsative scaling). The mean score reliabilities for the
1985 revisions (see Table 2) are marginal for internal
consistency (although much improved over the 1976 form) and 
strong for temporal stability. Perhaps the future of the LSI 
lies with continued revision. The current results would indicate 
that the LSI's past is sufficiently storied to preclude future 
use, particularly when one considers that reliability is a 
necessary but insufficient condition for validity. 

In sum, the current findings indicate that continued use of 
the LSI should be considered questionable at best. Despite prior
psychometric reviews with similar outcomes (Atkinson, 1991;
Ruble & Stout, 1990, 1994), the LSI has enjoyed continued use in
the literature. As explained by Atkinson (1991):

Considering the popularity of the instrument, face validity 
may have been what has kept practitioners and researchers 
returning to the LSI. While authors like Freedman and
Stumpf (1980) acknowledged the face validity of the LSI,
they proposed, as did others, that what meets the eye may
be less than the beholder suspects. ...Continued
applications of the LSI-1985 seem warranted for dialogic,
rather than diagnostic, purposes as long as the user is
mindful and open about the instrument's apparent
limitations. ...Heretofore, it seems face validity has been
the saving grace of the LSI... (pp. 158-159, italics in
original)

Unfortunately for the LSI, face validity is insufficient
psychometric evidence for most applications.
Table 1

Illustrative Responses and Scale Intercorrelations for Five
Subjects on LSI Scales

Subject    Ipsative Rankings               Correlations
No.        CE   RO   AE   AC       Scale    CE     RO     AE     AC
1.          1    2    3    4       CE      1.00
2.          2    1    4    3       RO      -.84   1.00
3.          3    1    2    4       AE      -.28    .00   1.00
4.          1    3    2    4       AC       .25   -.28   -.84   1.00
5.          1    3    4    2

Note. Subject responses are ranked, 1 to 4, across the four
attributes.
Table 2

Descriptive Statistics for Coefficient alpha and Test-retest
Reliabilities for Various Test Forms by LSI Subscale.

Coefficient alpha

Subscale                           1976 version   1985 version   revised 1985 version
Concrete Experience          M         .420           .809           .680
                             SD        .112           .045           .090
                             N         10             26             15
Reflective Observation       M         .602           .812           .707
                             SD        .095           .034           .048
                             N         10             26             14
Active Experimentation       M         .489           .843           .666
                             SD        .181           .033           .122
                             N         10             26             14
Abstract Conceptualization   M         .635           .830           .763
                             SD        .094           .025           .056
                             N         10             26             15

Test-retest

Subscale                           1976 version   1985 version   revised 1985 version
Concrete Experience          M         .460           .312           .877
                             SD        .095           .120           .225
                             N         11             20             7
Reflective Observation       M         .515           .472           .914
                             SD        .142           .136           .135
                             N         11             20             7
Active Experimentation       M         .450           .515           .904
                             SD        .109           .129           .145
                             N         11             20             7
Abstract Conceptualization   M         .581           .486           .917
                             SD        .091           .091           .145
                             N         11             20